
    Carving model-free inference

    In many large-scale experiments, the investigator begins with pilot data to look for promising findings. When fresh data becomes available at a later point in time, or from a different source, she is left with the question of how to use the full data to conduct inference for the selected findings. By compensating for the over-optimism from selection, carving permits a reuse of the pilot data for valid inference. The principle behind carving is quite appealing in practice: instead of throwing away the pilot samples, carving simply discards the information consumed at the time of selection. However, the theoretical justification for carving is strongly tied to parametric models, an example being the ubiquitous Gaussian model. In this paper we develop asymptotic guarantees to substantiate the use of carving beyond Gaussian generating models. In simulations and in an application to gene expression data, we find that carving delivers valid and tight confidence intervals in model-free settings.
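    The idea admits a compact illustration in one dimension. The sketch below is our own construction, not code from the paper: it assumes Gaussian data with known variance, reports the finding only when the pilot mean clears a threshold c, and then bases inference on the full-sample mean conditional on that pilot-stage selection event, rather than on the held-out samples alone. The sample sizes n1, n2 and the threshold c are illustrative.

```python
# Minimal carving sketch for a single Gaussian mean (a toy example).
# Assumes X_i ~ N(mu, 1); selection: report only if the pilot mean > c.
# Carving uses the FULL-sample mean, conditional on the pilot-stage
# selection event, instead of discarding the pilot data.
import numpy as np
from scipy import stats
from scipy.integrate import quad

def carved_pvalue(xbar_full, n1, n2, c, mu0=0.0):
    """One-sided P(full mean >= observed | pilot mean > c) under mu = mu0."""
    n = n1 + n2
    s1, s2 = 1 / np.sqrt(n1), 1 / np.sqrt(n2)   # sds of pilot/holdout means
    def integrand(x1):
        # full mean >= xbar_full  <=>  holdout mean >= (n*xbar_full - n1*x1)/n2
        t = (n * xbar_full - n1 * x1) / n2
        return stats.norm.pdf(x1, mu0, s1) * stats.norm.sf(t, mu0, s2)
    num, _ = quad(integrand, c, mu0 + 10 * s1)   # joint prob. of tail event and selection
    return num / stats.norm.sf(c, mu0, s1)       # divide by P(selection)

rng = np.random.default_rng(0)
n1, n2, c = 40, 60, 0.25
x = rng.normal(0.4, 1.0, n1 + n2)                # true mu = 0.4
if x[:n1].mean() > c:                            # selection on the pilot data
    print("carved p-value:", carved_pvalue(x.mean(), n1, n2, c))
```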

    Approximate selective inference via maximum likelihood

    This article considers a conditional approach to selective inference via approximate maximum likelihood for data described by Gaussian models. There are two important considerations in adopting a post-selection inferential perspective: one concerns the effective use of information in the data, while the other deals with the computational cost of adjusting for selection. Our approximate proposal serves both purposes: (i) it exploits randomness for an efficient use of the left-over information from selection; (ii) it bypasses potentially expensive MCMC sampling from conditional distributions. At the core of our method is the solution to a convex optimization problem, which takes a separable form across multiple selection queries. This allows us to address the problem of tractable and efficient inference in many practical scenarios where more than one learning query is conducted to define, and perhaps redefine, models and their corresponding parameters. Through an in-depth analysis, we illustrate the potential of our proposal and provide extensive comparisons with other post-selective schemes in both randomized and non-randomized paradigms of inference.
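    As a rough sketch of the computational idea, the toy example below (our construction, not the paper's implementation) works in one dimension: the selection event is randomized by Gaussian noise, and the intractable log selection probability in the conditional likelihood is replaced by the value of a barrier-penalized convex optimization, so the approximate selective MLE is found without any MCMC sampling. The randomization scale tau, the threshold c, and the barrier are all illustrative choices.

```python
# Toy approximate selective MLE in one dimension (illustrative only).
# Z ~ N(mu, 1); randomization omega ~ N(0, tau^2); selection {Z + omega > c}.
# The log selection probability is replaced by a barrier-penalized convex
# optimization (a Laplace-type surrogate), avoiding MCMC altogether.
import numpy as np
from scipy.optimize import minimize_scalar

tau, c = 1.0, 0.0

def approx_log_sel_prob(mu):
    # log P(Z + omega > c; mu) ~= -min_u [(u - mu)^2 / (2(1 + tau^2)) + barrier(u - c)]
    # where u stands for Z + omega ~ N(mu, 1 + tau^2) and the barrier keeps u > c
    def objective(u):
        return (u - mu) ** 2 / (2 * (1 + tau ** 2)) + np.log(1 + 1 / (u - c))
    res = minimize_scalar(objective, bounds=(c + 1e-6, c + 20), method="bounded")
    return -res.fun

def selective_mle(z_obs):
    # maximize log phi(z - mu) - log P(selection; mu), with the surrogate normalizer
    neg_loglik = lambda mu: 0.5 * (z_obs - mu) ** 2 + approx_log_sel_prob(mu)
    return minimize_scalar(neg_loglik, bounds=(-10, 10), method="bounded").x

print("approximate selective MLE:", selective_mle(z_obs=1.5))  # shrinks below 1.5
```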

    Selective Inference with Distributed Data

    As datasets grow larger, they are often distributed across multiple machines that compute in parallel and communicate with a central machine through short messages. In this paper, we focus on sparse regression and propose a new procedure for conducting selective inference with distributed data. Although many distributed procedures exist for point estimation in the sparse setting, few options are available for estimating uncertainties or conducting hypothesis tests based on the estimated sparsity. We solve a generalized linear regression on each machine, which then communicates a selected set of predictors to the central machine. The central machine uses these selected predictors to form a generalized linear model (GLM). To conduct inference in the selected GLM, our procedure bases approximately valid selective inference on an asymptotic likelihood. The proposal requires only aggregated, relatively low-dimensional information from each machine, which is merged at the central machine for selective inference. By reusing low-dimensional summary statistics from the local machines, our procedure achieves higher power while keeping the communication cost low. The method also offers a solution to the notorious p-value lottery problem that arises when model selection is repeated on random splits of the data.
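    The communication pattern can be sketched as follows, with a Gaussian linear model standing in for the GLM and scikit-learn's Lasso as an illustrative local selector; the two-round protocol is our reading of the abstract, not the paper's exact procedure. Each machine reports its selected support, the central machine broadcasts the union, and each machine then returns low-dimensional summaries on that set, which the center merges to refit the selected model. The selective (conditional) adjustment itself is omitted here.

```python
# Sketch of a two-round communication pattern (illustrative), with a
# Gaussian linear model in place of the GLM and sklearn's Lasso as the
# local selector. Only the selection and summary-aggregation steps are shown.
import numpy as np
from sklearn.linear_model import Lasso

def local_select(X, y, alpha=0.1):
    # round 1: each machine reports the support of a local sparse fit
    return set(np.flatnonzero(Lasso(alpha=alpha).fit(X, y).coef_))

def local_summaries(X, y, E):
    # round 2: low-dimensional summaries on the agreed predictor set E
    XE = X[:, E]
    return XE.T @ XE, XE.T @ y

rng = np.random.default_rng(1)
p, n, K = 20, 200, 4
beta = np.zeros(p); beta[:3] = 1.0                 # three true signals
machines = []
for _ in range(K):
    X = rng.normal(size=(n, p))
    machines.append((X, X @ beta + 0.5 * rng.normal(size=n)))

# central machine: the union of local selections defines the selected model
E = sorted(set().union(*(local_select(X, y) for X, y in machines)))
sums = [local_summaries(X, y, E) for X, y in machines]
G = sum(g for g, _ in sums)                        # pooled X_E' X_E
v = sum(w for _, w in sums)                        # pooled X_E' y
print("selected:", E)
print("refit coefficients:", np.linalg.solve(G, v).round(2))
```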

    Exact Selective Inference with Randomization

    We introduce a pivot for exact selective inference with randomization. Not only does our pivot lead to exact inference in Gaussian regression models, but it is also available in closed form. We reduce the problem of exact selective inference to a bivariate truncated Gaussian distribution. In doing so, we give up some of the power achieved with approximate inference in Panigrahi and Taylor (2022), yet we always produce narrower confidence intervals than a closely related data-splitting procedure. For popular instances of Gaussian regression, this price -- in terms of power -- paid in exchange for exact selective inference is demonstrated in simulated experiments and in an HIV drug resistance analysis.
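    The mechanics of pivot inversion can already be seen in the univariate case. The sketch below is ours, not the paper's bivariate construction (which also carries the randomization): conditioning Z ~ N(mu, 1) on the selection event {Z > c}, the conditional survival function is an exact closed-form pivot, uniform under the true mu, and inverting it with a root finder yields an exact selective confidence interval.

```python
# Univariate illustration of pivot inversion (a sketch, not the paper's
# bivariate construction). Z ~ N(mu, 1) conditioned on selection {Z > c}:
# the conditional survival function is an exact closed-form pivot.
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def pivot(mu, z, c):
    # P(Z >= z | Z > c; mu), exactly Uniform(0,1) at the true mu;
    # the survival-function form stays numerically stable in both tails
    return norm.sf(z - mu) / norm.sf(c - mu)

def exact_interval(z, c, level=0.90):
    # the pivot is increasing in mu, so invert it at the two tail quantiles
    a = (1 - level) / 2
    lo = brentq(lambda m: pivot(m, z, c) - a, z - 20, z + 20)
    hi = brentq(lambda m: pivot(m, z, c) - (1 - a), z - 20, z + 20)
    return lo, hi

print("90% selective CI:", exact_interval(z=2.0, c=1.5))
```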